36 research outputs found

    Entity Coherence for Descriptive Text Structuring

    Get PDF
    Institute for Communicating and Collaborative SystemsAlthough entity coherence, i.e. the coherence that arises from certain patterns of references to entities, is of attested importance for characterising a descriptive text structure, whether and how current formal models of entity coherence such as Centering Theory can be used for the purposes of natural language generation remains unclear. This thesis investigates this issue and sets out to explore which of the many formulations of Centering best suits text structuring. In doing this, we assume text structuring to be a search task where different orderings of propositions are evaluated according to scores assigned by a metric. The main question behind this study is how to choose a metric of entity coherence among many alternatives as the only guidance to the text structuring component of a system that produces descriptions of objects. Different ways of defining metrics of entity coherence using Centering’s notions are discussed and a general corpus-based methodology is introduced to identify which of these metrics constitute the most promising candidates for search-based text structuring before the actual generation of the descriptive structure takes place. The performance of a large set of metrics is estimated empirically in a series of computational experiments using two kinds of data: (i) a reliably annotated corpus representing the genre of interest and (ii) data derived from an existing natural language generation system and ordered according to the instructions of a domain expert. A final experiment supplements our main methodology by automatically evaluating the best scoring orderings of some of the best performing metrics in comparison to an upper bound defined by orderings produced by multiple experts on additional application-specific data and a lower bound defined by a random baseline. The main findings are summarised as follows: In general, the simplest metric of entity coherence constitutes a very robust baseline for both datasets. However, when the metrics are modified according to an additional constraint on entity coherence, then the baseline is beaten in domain (ii). The employed modification is supported by the subsidiary evaluation which renders all employed metrics superior to the random baseline and helps identify the metric which overall constitutes the most suitable candidate (among the ones investigated) for search-based descriptive text structuring in domain (ii). This thesis provides substantial insight into the role of entity coherence as a descriptive text structuring constraint. Viewing Centering from an NLG perspective raises a series of interesting challenges that the thesis identifies and attempts to investigate to a certain extent. The general evaluation methodology and the results of the empirical studies are useful for any subsequent attempt to generate a descriptive text structure in the context of an application that makes use of the notion of entity coherence as modelled by Centering

    Evaluating Centering for Information Ordering Using Corpora

    Get PDF
    In this article we discuss several metrics of coherence defined using centering theory and investigate the usefulness of such metrics for information ordering in automatic text generation. We estimate empirically which is the most promising metric and how useful this metric is using a general methodology applied on several corpora. Our main result is that the simplest metric (which relies exclusively on NOCB transitions) sets a robust baseline that cannot be outperformed by other metrics which make use of additional centering-based features. This baseline can be used for the development of both text-to-text and concept-to-text generation systems. </jats:p

    Natural language processing in aid of FlyBase curators.

    Get PDF
    BACKGROUND: Despite increasing interest in applying Natural Language Processing (NLP) to biomedical text, whether this technology can facilitate tasks such as database curation remains unclear. RESULTS: PaperBrowser is the first NLP-powered interface that was developed under a user-centered approach to improve the way in which FlyBase curators navigate an article. In this paper, we first discuss how observing curators at work informed the design and evaluation of PaperBrowser. Then, we present how we appraise PaperBrowser's navigational functionalities in a user-based study using a text highlighting task and evaluation criteria of Human-Computer Interaction. Our results show that PaperBrowser reduces the amount of interactions between two highlighting events and therefore improves navigational efficiency by about 58% compared to the navigational mechanism that was previously available to the curators. Moreover, PaperBrowser is shown to provide curators with enhanced navigational utility by over 74% irrespective of the different ways in which they highlight text in the article. CONCLUSION: We show that state-of-the-art performance in certain NLP tasks such as Named Entity Recognition and Anaphora Resolution can be combined with the navigational functionalities of PaperBrowser to support curation quite successfully.RIGHTS : This article is licensed under the BioMed Central licence at http://www.biomedcentral.com/about/license which is similar to the 'Creative Commons Attribution Licence'. In brief you may : copy, distribute, and display the work; make derivative works; or make commercial use of the work - under the following conditions: the original author must be given credit; for any reuse or distribution, it must be made clear to others what the license terms of this work are

    Evaluating centering for sentence ordering in two new domains

    No full text
    This paper builds on recent research investigating sentence ordering in text production by evaluating the Centering-based metrics of coherence employed by Karamanis et al. (2004) using the data of Barzilay and Lapata (2005). This is the first time that Centering is evaluated empirically as a sentence ordering constraint in several domains, verifying the results reported in Karamanis et al.

    Greek nominal morphology for a classification-based Natural Language Generation system

    No full text
    This paper presents the Noun Morphological component of a Greek Natural Language Generation system which is based on a current system for English called IDAS. First, I describe the methodology that I followed in order to develop the Greek Lexicon and Grammar, using II, IDAS's classification-based knowledge representation system. Then, I show how classification is used in order to tackle the complex morphological system of the Greek noun. In my conclusion, I estimate the feasibility of using classification in order to create Natural Language Generation systems for other languages, on the basis of the existing English resources
    corecore